GLM Part6

Author

Murray Logan

Published

23/07/2023

1 Preparations

Load the necessary libraries

library(glmmTMB)       #for model fitting
library(car)           #for regression diagnostics
library(broom)         #for tidy output
library(ggfortify)     #for model diagnostics
library(DHARMa)        #for residual diagnostics
library(see)           #for plotting residuals
library(knitr)         #for kable
library(effects)       #for partial effects plots
library(ggeffects)     #for partial effects plots
library(emmeans)       #for estimating marginal means
library(modelr)        #for auxillary modelling functions
library(tidyverse)     #for data wrangling
library(lindia)        #for diagnostics of lm and glm
library(performance)   #for residuals diagnostics
library(sjPlot)        #for outputs
library(report)        #for reporting methods/results
library(easystats)     #framework for stats, modelling and visualisation
library(MASS)          #for negative binomials
library(MuMIn)         #for AICc
library(patchwork)     #for figure layouts

2 Scenario

An ecologist studying a rocky shore at Phillip Island, in southeastern Australia, was interested in how clumps of intertidal mussels are maintained (Quinn 1988). In particular, he wanted to know how densities of adult mussels affected recruitment of young individuals from the plankton. As with most marine invertebrates, recruitment is highly patchy in time, so he expected to find seasonal variation, and the interaction between season and density - whether effects of adult mussel density vary across seasons - was the aspect of most interest.

The data were collected from four seasons, and with two densities of adult mussels. The experiment consisted of clumps of adult mussels attached to the rocks. These clumps were then brought back to the laboratory, and the number of baby mussels recorded. There were 3-6 replicate clumps for each density and season combination.

Figure 1: Mussel
Table 1: Format of quinn.csv data files
SEASON DENSITY RECRUITS SQRTRECRUITS GROUP
Spring Low 15 3.87 SpringLow
.. .. .. .. ..
Spring High 11 3.32 SpringHigh
.. .. .. .. ..
Summer Low 21 4.58 SummerLow
.. .. .. .. ..
Summer High 34 5.83 SummerHigh
.. .. .. .. ..
Autumn Low 14 3.74 AutumnLow
.. .. .. .. ..
Table 2: Description of the variables in the quinn data file
SEASON Categorical listing of Season in which mussel clumps were collected ­ independent variable
DENSITY Categorical listing of the density of mussels within mussel clump ­ independent variable
RECRUITS The number of mussel recruits ­ response variable
SQRTRECRUITS Square root transformation of RECRUITS - needed to meet the test assumptions
GROUPS Categorical listing of Season/Density combinations - used for checking ANOVA assumptions

3 Read in the data

quinn <- read_csv("../public/data/quinn.csv", trim_ws = TRUE)
Rows: 42 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): SEASON, DENSITY, GROUP
dbl (2): RECRUITS, SQRTRECRUITS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Since we intend to model both SEASON and DENSITY as categorical variables, we need to explicitly declare them as factors.

quinn |> glimpse()
Rows: 42
Columns: 5
$ SEASON       <fct> Spring, Spring, Spring, Spring, Spring, Spring, Spring, S…
$ DENSITY      <fct> Low, Low, Low, Low, Low, High, High, High, High, High, Hi…
$ RECRUITS     <dbl> 15, 10, 13, 13, 5, 11, 10, 15, 10, 13, 1, 21, 31, 21, 18,…
$ SQRTRECRUITS <dbl> 3.872983, 3.162278, 3.605551, 3.605551, 2.236068, 3.31662…
$ GROUP        <chr> "SpringLow", "SpringLow", "SpringLow", "SpringLow", "Spri…
## Explore the first 6 rows of the data
quinn |> head()
# A tibble: 6 × 5
  SEASON DENSITY RECRUITS SQRTRECRUITS GROUP     
  <fct>  <fct>      <dbl>        <dbl> <chr>     
1 Spring Low           15         3.87 SpringLow 
2 Spring Low           10         3.16 SpringLow 
3 Spring Low           13         3.61 SpringLow 
4 Spring Low           13         3.61 SpringLow 
5 Spring Low            5         2.24 SpringLow 
6 Spring High          11         3.32 SpringHigh
quinn |> str()
tibble [42 × 5] (S3: tbl_df/tbl/data.frame)
 $ SEASON      : Factor w/ 4 levels "Spring","Summer",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ DENSITY     : Factor w/ 2 levels "High","Low": 2 2 2 2 2 1 1 1 1 1 ...
 $ RECRUITS    : num [1:42] 15 10 13 13 5 11 10 15 10 13 ...
 $ SQRTRECRUITS: num [1:42] 3.87 3.16 3.61 3.61 2.24 ...
 $ GROUP       : chr [1:42] "SpringLow" "SpringLow" "SpringLow" "SpringLow" ...
quinn |> datawizard::data_codebook()
quinn (42 rows and 5 variables, 5 shown)

ID | Name         | Type        | Missings |     Values |          N
---+--------------+-------------+----------+------------+-----------
1  | SEASON       | categorical | 0 (0.0%) |     Spring | 11 (26.2%)
   |              |             |          |     Summer | 12 (28.6%)
   |              |             |          |     Autumn | 10 (23.8%)
   |              |             |          |     Winter |  9 (21.4%)
---+--------------+-------------+----------+------------+-----------
2  | DENSITY      | categorical | 0 (0.0%) |       High | 24 (57.1%)
   |              |             |          |        Low | 18 (42.9%)
---+--------------+-------------+----------+------------+-----------
3  | RECRUITS     | numeric     | 0 (0.0%) |    [0, 69] |         42
---+--------------+-------------+----------+------------+-----------
4  | SQRTRECRUITS | numeric     | 0 (0.0%) |  [0, 8.31] |         42
---+--------------+-------------+----------+------------+-----------
5  | GROUP        | character   | 0 (0.0%) | AutumnHigh |  6 (14.3%)
   |              |             |          |  AutumnLow |  4 ( 9.5%)
   |              |             |          | SpringHigh |  6 (14.3%)
   |              |             |          |  SpringLow |  5 (11.9%)
   |              |             |          | SummerHigh |  6 (14.3%)
   |              |             |          |  SummerLow |  6 (14.3%)
   |              |             |          | WinterHigh |  6 (14.3%)
   |              |             |          |  WinterLow |  3 ( 7.1%)
--------------------------------------------------------------------

4 Exploratory data analysis

Model formula: \[ \begin{align} y_i &\sim{} \mathcal{Pois}(\lambda_i)\\ ln(\mu_i) &= \boldsymbol{\beta} \bf{X_i}\\[1em] \end{align} \]

where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and effects of season, density and their interaction on mussel recruitment.

ggplot(quinn, aes(y=RECRUITS, x=SEASON, fill=DENSITY)) +
     geom_boxplot()

Conclusions:

  • there is clear evidence of non-homogeneity of variance
  • specifically, there is evidence that the variance is related to the mean in that boxplots that are lower on the y-axis (low mean) also have lower variance (shorter boxplots)
  • this might be expected for count data and we might consider that a Poisson distribution (which assumes that mean and variance are equal - and thus related in a very specific way).

Lets mimic the effect of using a log link, by using log scaled y-axis.

ggplot(quinn, aes(y=RECRUITS, x=SEASON, fill=DENSITY)) +
  geom_boxplot() +
  scale_y_log10()
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

Conclusions:

  • that is an improvement

5 Fit the model

We actually have a couple of initial options:

  1. log transform the response variable (RECRUITS) and fit against a Gaussian distribution. If we do so, we would need to amend the response for cases when the response is zero (as the log of 0 is not defined). We could simply add 1 to each count before log transformed.

  2. fit against a Poisson distribution with a log link

I will include this, just for an illustration on how to fit such a model, we will not pursue it further.

quinn.glmG <- glm(log(RECRUITS+1) ~ DENSITY*SEASON, data=quinn, family=gaussian)
quinn.glm <- glm(RECRUITS ~ DENSITY*SEASON, data=quinn,
                  family=poisson(link='log'))

6 Model validation

quinn.glm %>% autoplot(which=1:6)

Conclusions:

  • most of these diagnostics seem ok, however, there is one observation (#34) that has a very high residual (and thus Cook’s D)
quinn.glm %>% performance::check_model()
Model has interaction terms. VIFs might be inflated.
  You may check multicollinearity among predictors of a model without
  interaction terms.

quinn.glm %>% performance::check_overdispersion()
# Overdispersion test

       dispersion ratio =   3.309
  Pearson's Chi-Squared = 112.497
                p-value = < 0.001
Overdispersion detected.
quinn.glm %>% performance::check_zeroinflation()
# Check for zero-inflation

   Observed zeros: 2
  Predicted zeros: 0
            Ratio: 0.00
Model is underfitting zeros (probable zero-inflation).

Conclusions:

  • there is evidence of over-dispersion
quinn.resid <- quinn.glm %>% simulateResiduals(plot=TRUE)

quinn.resid %>% testResiduals()

$uniformity

    Asymptotic one-sample Kolmogorov-Smirnov test

data:  simulationOutput$scaledResiduals
D = 0.19018, p-value = 0.09584
alternative hypothesis: two-sided


$dispersion

    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 3.1726, p-value < 2.2e-16
alternative hypothesis: two.sided


$outliers

    DHARMa bootstrapped outlier test

data:  simulationOutput
outliers at both margin(s) = 4, observations = 42, p-value < 2.2e-16
alternative hypothesis: two.sided
 percent confidence interval:
 0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.00928571428571429 ) 
                                         0.0952381 
$uniformity

    Asymptotic one-sample Kolmogorov-Smirnov test

data:  simulationOutput$scaledResiduals
D = 0.19018, p-value = 0.09584
alternative hypothesis: two-sided


$dispersion

    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 3.1726, p-value < 2.2e-16
alternative hypothesis: two.sided


$outliers

    DHARMa bootstrapped outlier test

data:  simulationOutput
outliers at both margin(s) = 4, observations = 42, p-value < 2.2e-16
alternative hypothesis: two.sided
 percent confidence interval:
 0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.00928571428571429 ) 
                                         0.0952381 
quinn.resid %>% testDispersion()


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 3.1726, p-value < 2.2e-16
alternative hypothesis: two.sided
quinn.resid %>% testZeroInflation()


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = 7.6923, p-value = 0.04
alternative hypothesis: two.sided
#testTemporalAutocorrelation(quinn.glm1)

Conclusions:

  • there is evidence of over-dispersion and outliers
  • there is also some evidence of zero-inflation
## goodness of fit
1-pchisq(quinn.glm$deviance, df=quinn.glm$df.residual)
[1] 2.457479e-12
## any evidence of overdispersion
quinn.glm$deviance/quinn.glm$df.residual
[1] 3.677366

Conclusions:

  • there is evidence of over-dispersion
quinn %>% group_by(SEASON, DENSITY) %>%
  summarise(Zeros = sum(RECRUITS==0), 
            Prop = Zeros/n(),
            Mean = mean(RECRUITS))
`summarise()` has grouped output by 'SEASON'. You can override using the
`.groups` argument.
# A tibble: 8 × 5
# Groups:   SEASON [4]
  SEASON DENSITY Zeros  Prop  Mean
  <fct>  <fct>   <int> <dbl> <dbl>
1 Spring High        0 0     10   
2 Spring Low         0 0     11.2 
3 Summer High        0 0     48.2 
4 Summer Low         0 0     22   
5 Autumn High        0 0     19.7 
6 Autumn Low         0 0     18.2 
7 Winter High        0 0      5.67
8 Winter Low         2 0.667  2.67
x <- rpois(100000,lambda = 2.67)
tab.1 <- table(x == 0)
tab.1/sum(tab.1)

  FALSE    TRUE 
0.92903 0.07097 
## OR,  over the entire data
## is this due to excessive zeros (zero-inflation)
tab <- table(quinn$RECRUITS == 0)
tab/sum(tab)

     FALSE       TRUE 
0.95238095 0.04761905 
## 5% is not many.. so it cant be zero-inflated
## how many 0's would we expect from a poisson distribution with a mean similar to our mean
mean(quinn$RECRUITS)
[1] 18.33333
x <- rpois(100000, lambda = mean(quinn$RECRUITS))
tab.1 <- table(x == 0)
tab.1/sum(tab.1)

FALSE 
    1 

Conclusion:

  • although at the level of the entire dataset, there is no evidence for excessive zeros, if we explore this separately within each SEASON by DENSITY combination, we see that for the Winter/Low density group, the proportion of zeros is very high.

7 Different model

In light of the discovery that the Poisson model was over-dispersed, we will explore a negative binomial model. Rather than assume that the variance is equal to the mean (dispersion of 1), the dispersion parameter is estimated. Negative binomial models are also able to account for some level of zero-inflation.

quinn.nb <- MASS::glm.nb(RECRUITS ~ DENSITY*SEASON, data=quinn)
quinn.nb %>% autoplot(which=1:6)

quinn.nb %>% performance::check_model()
Model has interaction terms. VIFs might be inflated.
  You may check multicollinearity among predictors of a model without
  interaction terms.

quinn.resid <- quinn.nb %>% simulateResiduals(plot=TRUE)

quinn.resid %>% testResiduals()

$uniformity

    Exact one-sample Kolmogorov-Smirnov test

data:  simulationOutput$scaledResiduals
D = 0.11213, p-value = 0.626
alternative hypothesis: two-sided


$dispersion

    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.78026, p-value = 0.624
alternative hypothesis: two.sided


$outliers

    DHARMa bootstrapped outlier test

data:  simulationOutput
outliers at both margin(s) = 1, observations = 42, p-value = 0.74
alternative hypothesis: two.sided
 percent confidence interval:
 0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.010952380952381 ) 
                                      0.02380952 
$uniformity

    Exact one-sample Kolmogorov-Smirnov test

data:  simulationOutput$scaledResiduals
D = 0.11213, p-value = 0.626
alternative hypothesis: two-sided


$dispersion

    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.78026, p-value = 0.624
alternative hypothesis: two.sided


$outliers

    DHARMa bootstrapped outlier test

data:  simulationOutput
outliers at both margin(s) = 1, observations = 42, p-value = 0.74
alternative hypothesis: two.sided
 percent confidence interval:
 0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.010952380952381 ) 
                                      0.02380952 
quinn.resid %>% testDispersion()


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.78026, p-value = 0.624
alternative hypothesis: two.sided
quinn.resid %>% testZeroInflation()


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = 6.1728, p-value = 0.104
alternative hypothesis: two.sided
## goodness of fit
1-pchisq(quinn.nb$deviance, df=quinn.nb$df.residual)
[1] 0.01311451
## any evidence of overdispersion
quinn.nb$deviance/quinn.nb$df.residual
[1] 1.614218
AICc(quinn.glm, quinn.nb)
          df     AICc
quinn.glm  8 324.4516
quinn.nb   9 298.7196

Conclusions:

  • on the basis of AICc, the negative binomial model would be considered better than the Poisson model.

Conclusions:

  • there is no evidence that the negative binomial model is over-dispersed or zero-inflated
  • all other diagnostics are also improved

8 Partial plots

quinn.nb %>% plot_model(type='eff',  terms=c('SEASON', 'DENSITY'))

quinn.nb %>% allEffects() %>% plot(multiline=TRUE, ci.style='bar')

quinn.nb %>% allEffects() %>% plot(multiline=TRUE, ci.style='bar', type='link')

quinn.nb %>% ggpredict(c('SEASON', 'DENSITY')) %>% plot()

quinn.nb %>% ggemmeans(~SEASON*DENSITY) %>% plot()

quinn.nb |> estimate_relation() |> plot()

quinn.nb |> estimate_relation(at = c("SEASON", "DENSITY")) |> plot()

9 Model investigation / hypothesis testing

Lets start with the estimated coefficients on the link (log) scale

quinn.nb %>% summary()

Call:
MASS::glm.nb(formula = RECRUITS ~ DENSITY * SEASON, data = quinn, 
    init.theta = 9.022467857, link = log)

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.3026     0.1875  12.283  < 2e-16 ***
DENSITYLow                0.1133     0.2742   0.413  0.67934    
SEASONSummer              1.5721     0.2389   6.581 4.69e-11 ***
SEASONAutumn              0.6763     0.2492   2.714  0.00664 ** 
SEASONWinter             -0.5680     0.2881  -1.971  0.04870 *  
DENSITYLow:SEASONSummer  -0.8970     0.3509  -2.556  0.01059 *  
DENSITYLow:SEASONAutumn  -0.1881     0.3788  -0.496  0.61955    
DENSITYLow:SEASONWinter  -0.8671     0.5338  -1.624  0.10432    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(9.0225) family taken to be 1)

    Null deviance: 183.269  on 41  degrees of freedom
Residual deviance:  54.883  on 34  degrees of freedom
AIC: 293.09

Number of Fisher Scoring iterations: 1

              Theta:  9.02 
          Std. Err.:  3.69 

 2 x log-likelihood:  -275.095 

Conclusions:

  • the intercept represents the estimated mean of the first combination of Season (Spring) and Density (High). On the link scale this is 2.3
  • the difference between Low and High adult density in spring is 0.11, although this is not significant
  • the difference between Spring and Summer for High adult density is 1.57
  • the difference between Spring and Autumn for High adult density is 0.68
  • the difference between Spring and Winter for High adult density is -0.57
  • if there were no interactions, we would expect the Low density Summer recruitment to be the additive of the main effects (Low and Summer). However, the modelled mean is 0.9 less than the additive effects would have expected. This value is significantly different to 0, indicating that there is evidence that the density effect in Summer is different to that in Spring.
  • the density effect in Autumn and Winter were not found to be significantly different from what you would expect from an additive model.
quinn.nb %>% tidy(conf.int=TRUE)
# A tibble: 8 × 7
  term                  estimate std.error statistic  p.value conf.low conf.high
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)              2.30      0.187    12.3   1.11e-34    1.94    2.67   
2 DENSITYLow               0.113     0.274     0.413 6.79e- 1   -0.424   0.652  
3 SEASONSummer             1.57      0.239     6.58  4.69e-11    1.11    2.04   
4 SEASONAutumn             0.676     0.249     2.71  6.64e- 3    0.189   1.17   
5 SEASONWinter            -0.568     0.288    -1.97  4.87e- 2   -1.14   -0.00675
6 DENSITYLow:SEASONSum…   -0.897     0.351    -2.56  1.06e- 2   -1.59   -0.209  
7 DENSITYLow:SEASONAut…   -0.188     0.379    -0.496 6.20e- 1   -0.930   0.556  
8 DENSITYLow:SEASONWin…   -0.867     0.534    -1.62  1.04e- 1   -1.95    0.154  

Now if we exponentiate to put these estimates on the response scale:

quinn.nb %>% tidy(conf.int=TRUE, exponentiate=TRUE)
# A tibble: 8 × 7
  term                  estimate std.error statistic  p.value conf.low conf.high
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)             10.0       0.187    12.3   1.11e-34    6.93     14.5  
2 DENSITYLow               1.12      0.274     0.413 6.79e- 1    0.654     1.92 
3 SEASONSummer             4.82      0.239     6.58  4.69e-11    3.02      7.72 
4 SEASONAutumn             1.97      0.249     2.71  6.64e- 3    1.21      3.21 
5 SEASONWinter             0.567     0.288    -1.97  4.87e- 2    0.320     0.993
6 DENSITYLow:SEASONSum…    0.408     0.351    -2.56  1.06e- 2    0.205     0.811
7 DENSITYLow:SEASONAut…    0.829     0.379    -0.496 6.20e- 1    0.394     1.74 
8 DENSITYLow:SEASONWin…    0.420     0.534    -1.62  1.04e- 1    0.142     1.17 

Conclusions:

  • the intercept represents the estimated mean of the first combination of Season (Spring) and Density (High). On the response scale this is 10
  • there is a 0.11 fold difference between High and Low adult density in spring, although this is not significant
  • there is a 1.57 fold difference between Summer and Spring for High adult density
  • there is a 0.68 fold difference between Autumn and Spring for High adult density
  • there is a -0.57 fold difference between Winter and Spring for High adult density
  • the modelled mean for Summer Low density is 0.9 fold different (less than half) that we would have expected in the absence of interactions
  • the density effect in Autumn and Winter were not found to be significantly different from what you would expect from an additive model.

If we compare the above to the over-dispersed Poisson, we see that the estimates are the same, but that the standard errors are underestimated and hence the confidence intervals are narrower (for over-dispersed model)

quinn.glm %>% tidy(conf.int=TRUE, exponentiate=TRUE)
# A tibble: 8 × 7
  term                  estimate std.error statistic  p.value conf.low conf.high
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)             10.0       0.129    17.8   3.73e-71    7.68     12.7  
2 DENSITYLow               1.12      0.186     0.610 5.42e- 1    0.777     1.61 
3 SEASONSummer             4.82      0.142    11.1   1.55e-28    3.68      6.42 
4 SEASONAutumn             1.97      0.159     4.27  1.99e- 5    1.45      2.70 
5 SEASONWinter             0.567     0.215    -2.65  8.15e- 3    0.368     0.857
6 DENSITYLow:SEASONSum…    0.408     0.213    -4.20  2.64e- 5    0.268     0.620
7 DENSITYLow:SEASONAut…    0.829     0.238    -0.790 4.30e- 1    0.519     1.32 
8 DENSITYLow:SEASONWin…    0.420     0.435    -1.99  4.61e- 2    0.169     0.945

10 Predictions

In order to tease apart the significant interaction(s), it might be useful to explore the effect of Density separately within each Season.

quinn.nb %>% emmeans(~DENSITY|SEASON) %>% pairs() %>% summary(infer=TRUE)
SEASON = Spring:
 contrast   estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
 High - Low  -0.1133 0.274 Inf    -0.651     0.424  -0.413  0.6793

SEASON = Summer:
 contrast   estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
 High - Low   0.7836 0.219 Inf     0.354     1.213   3.577  0.0003

SEASON = Autumn:
 contrast   estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
 High - Low   0.0748 0.261 Inf    -0.438     0.587   0.286  0.7749

SEASON = Winter:
 contrast   estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
 High - Low   0.7538 0.458 Inf    -0.144     1.652   1.646  0.0999

Results are given on the log (not the response) scale. 
Confidence level used: 0.95 

Conclusions:

  • there is no evidence of an effect of Density in either Spring, Autumn or Winter
  • in Summer, recruitment is 0.78 higher (on a log scale) in High Density populations than Low Density populations.

Or we could express these on a factor (fold/ratio) scale.

quinn.nb %>% emmeans(~DENSITY|SEASON, type='response') %>% pairs()
SEASON = Spring:
 contrast   ratio    SE  df null z.ratio p.value
 High / Low 0.893 0.245 Inf    1  -0.413  0.6793

SEASON = Summer:
 contrast   ratio    SE  df null z.ratio p.value
 High / Low 2.189 0.480 Inf    1   3.577  0.0003

SEASON = Autumn:
 contrast   ratio    SE  df null z.ratio p.value
 High / Low 1.078 0.282 Inf    1   0.286  0.7749

SEASON = Winter:
 contrast   ratio    SE  df null z.ratio p.value
 High / Low 2.125 0.973 Inf    1   1.646  0.0999

Tests are performed on the log scale 

Or perhaps even an absolute difference scale.

quinn.nb %>% emmeans(~DENSITY|SEASON, type='response') %>% regrid() %>% pairs()
SEASON = Spring:
 contrast   estimate   SE  df z.ratio p.value
 High - Low    -1.20 2.92 Inf  -0.411  0.6812

SEASON = Summer:
 contrast   estimate   SE  df z.ratio p.value
 High - Low    26.17 7.97 Inf   3.284  0.0010

SEASON = Autumn:
 contrast   estimate   SE  df z.ratio p.value
 High - Low     1.42 4.92 Inf   0.288  0.7734

SEASON = Winter:
 contrast   estimate   SE  df z.ratio p.value
 High - Low     3.00 1.64 Inf   1.829  0.0673

Conclusions:

  • there is no evidence of an effect of Density in either Spring, Autumn or Winter
  • in Summer, recruitment is 2.19 fold higher in High Density populations than Low Density populations.
quinn.nb %>% emmeans(~DENSITY|SEASON,  type='response') %>% pairs() %>% summary(infer=TRUE)
SEASON = Spring:
 contrast   ratio    SE  df asymp.LCL asymp.UCL null z.ratio p.value
 High / Low 0.893 0.245 Inf     0.522      1.53    1  -0.413  0.6793

SEASON = Summer:
 contrast   ratio    SE  df asymp.LCL asymp.UCL null z.ratio p.value
 High / Low 2.189 0.480 Inf     1.425      3.36    1   3.577  0.0003

SEASON = Autumn:
 contrast   ratio    SE  df asymp.LCL asymp.UCL null z.ratio p.value
 High / Low 1.078 0.282 Inf     0.646      1.80    1   0.286  0.7749

SEASON = Winter:
 contrast   ratio    SE  df asymp.LCL asymp.UCL null z.ratio p.value
 High / Low 2.125 0.973 Inf     0.866      5.22    1   1.646  0.0999

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 
Tests are performed on the log scale 

It might be useful to capture these pairwise contrasts so that we can graph them as a caterpillar plot.

In the following, I will present the effects on a log (based 2) axis. Such a scale presents doubling and halving (etc) equidistant from 1.

eff <- quinn.nb %>%
    emmeans(~DENSITY|SEASON,  type='response') %>%
    pairs() %>%
    summary(infer=TRUE) %>%
    as.data.frame

eff %>%
    ggplot(aes(y=ratio, x=SEASON)) +
    geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL)) +
    geom_text(aes(label=sprintf("p=%.2f", p.value)), nudge_x=0.25) +
    geom_hline(yintercept=1, linetype='dashed') +
    scale_x_discrete(name='') +
    scale_y_continuous(name='Density effect (High vs Low)', trans=scales::log2_trans(),
                       breaks=scales::breaks_log(base=2)) +
    coord_flip(ylim=c(0.25, 4)) +
    theme_classic()

11 Summary figures

As these summarise only involve categorical predictors, there is no need to define a prediction grid. For categorical predictors, the default grid will assume that you are interested in all the levels of the categorical predictors.

newdata <- emmeans(quinn.nb, ~DENSITY|SEASON, type = 'response') %>% as.data.frame
head(newdata)
 DENSITY SEASON response       SE  df asymp.LCL asymp.UCL
 High    Spring 10.00000 1.874542 Inf   6.92530  14.43980
 Low     Spring 11.20000 2.240673 Inf   7.56705  16.57713
 High    Summer 48.16667 7.133321 Inf  36.03185  64.38826
 Low     Summer 22.00000 3.550677 Inf  16.03406  30.18574
 High    Autumn 19.66667 3.228389 Inf  14.25612  27.13064
 Low     Autumn 18.25000 3.713650 Inf  12.24768  27.19392

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 
ggplot(newdata, aes(y = response, x = SEASON, fill = DENSITY)) +
    geom_pointrange(aes(ymin = asymp.LCL, ymax = asymp.UCL, shape = DENSITY),
                    position = position_dodge(width = 0.2)) +
    theme_classic() +
    theme(axis.title.x = element_blank(),
          legend.position = c(0.01,1), legend.justification = c(0,1)) +
    annotate(geom = 'text', x = 'Summer', y = 70, label = '*', size = 7) +
    scale_shape_manual(values = c(21, 22))

12 Zero-inflated model

Unfortunately, DHARMa and performance do not work with zeroinf() models.

library(pscl)
library(glmmTMB)
quinn.zip <- zeroinfl(RECRUITS ~ DENSITY*SEASON | 1, data=quinn,  dist='poisson')
quinn.zip <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~1, data=quinn,  family=poisson())
quinn.resid <- quinn.zip %>% simulateResiduals(plot=TRUE)
## The following does not work due to a lack of pearson residuals for glmmTMB
## quinn.zip %>% performance::check_overdispersion()

quinn.zip %>% summary()
 Family: poisson  ( log )
Formula:          RECRUITS ~ DENSITY * SEASON
Zero inflation:            ~1
Data: quinn

     AIC      BIC   logLik deviance df.resid 
   320.6    336.2   -151.3    302.6       33 


Conditional model:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.3026     0.1291  17.836  < 2e-16 ***
DENSITYLow                0.1133     0.1858   0.610  0.54191    
SEASONSummer              1.5721     0.1419  11.081  < 2e-16 ***
SEASONAutumn              0.6763     0.1586   4.266 1.99e-05 ***
SEASONWinter             -0.5680     0.2147  -2.646  0.00815 ** 
DENSITYLow:SEASONSummer  -0.8970     0.2134  -4.202 2.64e-05 ***
DENSITYLow:SEASONAutumn  -0.1881     0.2381  -0.790  0.42957    
DENSITYLow:SEASONWinter   0.2165     0.4534   0.478  0.63300    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.0037     0.7305  -4.112 3.92e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#tidy(quinn.zip,  conf.int=TRUE, exponentiate = TRUE)
exp(-3.0037)
[1] 0.0496032
## quinn.zip1 <- zeroinfl(RECRUITS ~ DENSITY*SEASON | SEASON, data=quinn,  dist='poisson')
quinn.zip <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~1, data=quinn,  family=poisson())
summary(quinn.zip)
 Family: poisson  ( log )
Formula:          RECRUITS ~ DENSITY * SEASON
Zero inflation:            ~1
Data: quinn

     AIC      BIC   logLik deviance df.resid 
   320.6    336.2   -151.3    302.6       33 


Conditional model:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.3026     0.1291  17.836  < 2e-16 ***
DENSITYLow                0.1133     0.1858   0.610  0.54191    
SEASONSummer              1.5721     0.1419  11.081  < 2e-16 ***
SEASONAutumn              0.6763     0.1586   4.266 1.99e-05 ***
SEASONWinter             -0.5680     0.2147  -2.646  0.00815 ** 
DENSITYLow:SEASONSummer  -0.8970     0.2134  -4.202 2.64e-05 ***
DENSITYLow:SEASONAutumn  -0.1881     0.2381  -0.790  0.42957    
DENSITYLow:SEASONWinter   0.2165     0.4534   0.478  0.63300    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.0037     0.7305  -4.112 3.92e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
quinn.resid <- quinn.zip %>% simulateResiduals(plot=TRUE)

quinn.resid %>% testZeroInflation()


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = 0.98232, p-value = 1
alternative hypothesis: two.sided
quinn.resid %>% testDispersion()


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 1.4123, p-value = 0.512
alternative hypothesis: two.sided
quinn.zip1 <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~SEASON, data=quinn,  family=poisson())
quinn.resid <- quinn.zip1 %>% simulateResiduals(plot=TRUE)

quinn.zip1 %>% summary()
 Family: poisson  ( log )
Formula:          RECRUITS ~ DENSITY * SEASON
Zero inflation:            ~SEASON
Data: quinn

     AIC      BIC   logLik deviance df.resid 
   320.0    340.9   -148.0    296.0       30 


Conditional model:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.3026     0.1291  17.836  < 2e-16 ***
DENSITYLow                0.1133     0.1858   0.610  0.54191    
SEASONSummer              1.5721     0.1419  11.081  < 2e-16 ***
SEASONAutumn              0.6763     0.1586   4.266 1.99e-05 ***
SEASONWinter             -0.5680     0.2147  -2.646  0.00815 ** 
DENSITYLow:SEASONSummer  -0.8970     0.2134  -4.202 2.64e-05 ***
DENSITYLow:SEASONAutumn  -0.1881     0.2381  -0.790  0.42957    
DENSITYLow:SEASONWinter   0.2291     0.4375   0.524  0.60045    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Zero-inflation model:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -20.468   8393.486  -0.002    0.998
SEASONSummer    -3.275  42171.914   0.000    1.000
SEASONAutumn    -3.527  52021.072   0.000    1.000
SEASONWinter    19.214   8393.486   0.002    0.998
plogis(-20.468)
[1] 1.290805e-09
plogis(-20.468 + 19.214)
[1] 0.2220085
## quinn.zinb <- zeroinfl(RECRUITS ~ DENSITY*SEASON | 1, data=quinn,  dist='negbin')
quinn.zinb <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~SEASON, data=quinn,  family=nbinom2())
quinn.resid <- quinn.zinb %>% simulateResiduals(plot=TRUE)

AICc(quinn.zip,  quinn.zip1, quinn.zinb)
           df     AICc
quinn.zip   9 326.1881
quinn.zip1 12 330.7988
quinn.zinb 13 309.2125
quinn.zinb %>% summary()
 Family: nbinom2  ( log )
Formula:          RECRUITS ~ DENSITY * SEASON
Zero inflation:            ~SEASON
Data: quinn

     AIC      BIC   logLik deviance df.resid 
   296.2    318.8   -135.1    270.2       29 


Dispersion parameter for nbinom2 family ():   11 

Conditional model:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.3026     0.1784  12.907  < 2e-16 ***
DENSITYLow                0.1133     0.2605   0.435  0.66355    
SEASONSummer              1.5721     0.2246   7.000 2.56e-12 ***
SEASONAutumn              0.6763     0.2355   2.872  0.00408 ** 
SEASONWinter             -0.5680     0.2764  -2.055  0.03988 *  
DENSITYLow:SEASONSummer  -0.8970     0.3305  -2.714  0.00665 ** 
DENSITYLow:SEASONAutumn  -0.1881     0.3577  -0.526  0.59899    
DENSITYLow:SEASONWinter   0.2130     0.5887   0.362  0.71757    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Zero-inflation model:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -20.772   9771.312  -0.002    0.998
SEASONSummer     -6.012 189293.632   0.000    1.000
SEASONAutumn     -5.942 200192.420   0.000    1.000
SEASONWinter     19.507   9771.312   0.002    0.998

13 References

Quinn, G. P. 1988. “Ecology of the Intertidal Pulmonate Limpet Siphonaria Diemenensis Quoy Et Gaimard. II Reproductive Patterns and Energetics.” Journalofexperimentalmarinebiologyandecology 117: 137–56.